Abstract

The principle of adaptation in a noisy retrieval environment is 
extended here to a diluted attractor neural network of Q-state 
neurons trained with noisy data. The network is adapted to an 
appropriate noisy training overlap and training activity which are 
determined self-consistently by the optimized retrieval attractor 
overlap and activity. The optimized storage capacity and the 
corresponding retriever overlap are considerably enhanced by an 
adequate threshold in the states. Explicit results for improved 
optimal performance and new retriever phase diagrams are obtained
for Q = 3 and Q = 4, with coexisting phases over a wide range of
thresholds. Most of the interesting results are stable to
replica-symmetry-breaking fluctuations.



PACS numbers: 87.10.+e, 64.60.Cn



1. Introduction 



Since the pioneering work of Hopfield, there has been much interest in both
the training and performance of attractor neural networks. Training consists
in encoding an appropriate synaptic matrix that enables the network to store
a macroscopic number of patterns, while the performance of a network refers
to the ability to retrieve one or a specific set of stored patterns. Training
and performance are usually thought of as separate stages in the operation
of a network.

The retrieval performance of an attractor network can be studied in two
different scenarios. One is characterized by a fixed synaptic prescription,
as in the case of the Hopfield model or the maximally stable network
(MSN) [4-6], while in the other the entire space of synaptic interactions
is searched for optimal performance whenever there is a change in the
retrieval environment. The synapses in the first scenario are determined in
an ordinary learning stage and the performance of the network is optimized
separately in a given training environment. In the second scenario one
resorts to a continuously ongoing adaptive training process in which the
network performance is optimized in an adiabatically evolving retrieval
environment. For each value of the noise parameter T (the temperature of the
retrieval dynamics) and of the storage ratio $\alpha$, the network has a
unique interaction configuration, the so-called retriever. This is in
contrast to the retrieval performance that yields the phase diagrams for the
Hopfield model or the MSN, in which the interaction configuration determined
in the separate learning stage is the same for all T and $\alpha$.

Adaptive training processes seem to be biologically appealing as a means
of learning from the environment. The adaptive process in the second scenario
requires training the network with noisy patterns, and it is a procedure
that does not separate the training process as a distinct step from the
operating stage of the network. The principle of adaptation in a network of
binary units consists in searching the interaction space for the optimized
network performance, adjusting the training noise to be the same as the
retrieval noise in each step of the adiabatically evolving retrieval
environment. Both noises refer to the Hamming distances between the actual
states of the network and the encoded patterns.



Training noise has been introduced in feedforward networks [10, 11] in
order to avoid overfitting to training examples, and in attractor networks
with the purpose of enlarging their basins of attraction. A slightly
distorted set of random patterns is presented to the network in the process
of encoding the synaptic matrix by means of a stepwise updating procedure
following the perceptron learning rule. The MSN is generated by an
infinitesimal amount of training noise and, except for low retrieval noise T
and low load $\alpha$, the performance of the optimally adapted network is
clearly superior to that of the MSN. In particular, for low to moderate T and
higher load $\alpha$, a second optimal solution in interaction space appears
for each value of the training noise in the optimally adapted network. This
solution is a weaker retriever which can be interpreted as an attractor of
self-adaptation.

The point is that the second retriever constitutes a further solution to
the optimization process, with its own interaction, in a neighborhood of
interaction space where there is no solution for the MSN. This second optimal
solution appears as a low-performance solution when retrieval noise is absent
or low to moderate, with improving performance, up to a certain point,
as the retrieval noise is raised. Thus, there has to be already a certain
level of retrieval noise for the weak retriever to have an interesting
performance. Moreover, whenever the solution exists it is only within a
narrow range of $\alpha$.

The principle of adaptation has been worked out, so far, only for a
network of binary neurons and the purpose of the present paper is to explore
the merits of an extension of the principle to a multi-state attractor network
in which both the neurons and the noisy training versions of the encoded
patterns can be in Q (> 2) states. This adds two new dimensions to the
study of the performance of the network. First, the randomly distributed
noisy patterns presented to the network in the training process introduce a
training activity $a_t$. Second, the firing rate of the neurons is determined
by one or more thresholds, or a growth parameter in the dynamical output
function. Thus, in the extension considered in this work, an evolving
dynamical overlap $m(\tau)$ and a dynamical activity $a(\tau)$ are generated
at each time step $\tau$ of the neuron updating procedure. The search for the
optimized network performance by means of the extended adaptation principle
consists now in the adjustment of the training overlap $m_t$ and the training
activity $a_t$ to be $m_t = m(\tau)$ and $a_t = a(\tau)$, respectively, i.e.,
the same as the retrieval overlap and dynamical activity in each step of the
adiabatically evolving retrieval environment. Adaptive performance in this
wider sense is a self-consistent procedure in which the retrieval environment
continuously optimizes the attractor performance of the network.

Networks of multi-state neurons have interesting features and
applications. Feedforward networks of such units can be used to study
multi-class classification problems [12], while multi-state attractor
networks, which are useful for the recognition of various grey-toned patterns,
have interesting inferential properties, by means of which the storage
capacity and the retrieval ability can be enhanced when they are trained with
patterns of low activity [13-16]. Also, the categorization ability can be
improved in a multi-state network with hierarchical patterns. There has been
considerable recent interest in such networks [17-19].

We consider an extremely diluted network and, for simplicity, restrict
ourselves to binary unbiased encoded patterns. The main emphasis of the
paper is on the storage capacity, the quality of the performance of the strong
and the second retrievers and on the characterization of the various phases
that can appear. With that purpose we produce explicit results for a network
with Q = 3 or Q = 4 states. It will be shown that, within a finite range
of a threshold parameter, there is a considerable improvement of the storage
capacity and in the high performance of the second retriever solution, in
the absence of retrieval noise or for low retrieval noise, when compared with
the optimally adapted network of binary neurons. In particular, we show that
the second retriever may attain a fairly high retrieval overlap for small
training noise in a regime where there is no solution for the optimally
adapted network of binary neurons. These are important results in the search
for improvement of the behavior of attractor neural networks. We restrict
ourselves to finite-Q state networks, in place of addressing the general
(large-Q) case.

The outline of the paper is the following. In section 2 we extend the
training with noise procedure in the space of synaptic interactions to a
Q-state Ising network by means of a quenched optimization approach,
within the replica-symmetry Ansatz, introducing a smooth cost function
given by an average squared Hamming distance. The equations for the
adaptation process in a noisy retrieval environment are formulated in that
section. The explicit results for the fixed-point behavior, the storage
capacity and the corresponding phase diagrams for self-adaptation for the
three-state and the four-state models are discussed in section 3, and compared
with the MSN. The domain of validity of the replica symmetric results is
determined by the de Almeida-Thouless lines in terms of the retrieval noise
and the threshold in the dynamical updating procedure. A summary and
concluding remarks are presented in section 4.

2. Training with noise and adaptation



Consider a network of N nodes with a dynamical variable $S_i(\tau)$, at time
step $\tau$ on node i, that indicates the extent to which the unit on node i
fires. Each unit can be in any one of Q Ising states

$$\sigma_k = -1 + \frac{2(k-1)}{Q-1} \qquad (1)$$

in the interval [-1, +1], for k = 1, ..., Q. A macroscopic set of p binary
patterns $\{\xi_i^\mu = \pm 1;\ \mu = 1, \dots, p;\ i = 1, \dots, N\}$, with
$p = \alpha C$, is encoded in the network in the learning process, where C is
the connectivity of a node. The patterns constitute a set of independent
identically distributed random variables. Training consists in presenting to
the network a noisy version $\{R_i^\mu(\tau)\}$ of the patterns, at time
$\tau$, and in the optimization of the network output after one time step.
This involves a dynamical process in the space of state configurations of the
network and, to keep the dynamics simple, we restrict ourselves to an
extremely diluted network. Each $R_i^\mu(\tau)$ is assumed to be in one of Q
states, $\sigma_k$, and can be thought of as an example of the pattern
$\xi_i^\mu$. Assuming that every noisy pattern has the same overlap $m_t$ with
the corresponding pattern and that the activity $a_t$ is the same for all
patterns in the training set, we define

$$m_t = \frac{1}{N}\sum_i \xi_i^\mu \langle R_i^\mu(\tau)\rangle_R \qquad (2)$$

and

$$a_t = \frac{1}{N}\sum_i \langle (R_i^\mu(\tau))^2\rangle_R , \qquad (3)$$

where the brackets $\langle\cdots\rangle_R$ denote averages over the
probability distribution of $R_i^\mu$. Thus, the noisy training inputs are
constrained to satisfy the mean $\langle R_i^\mu(\tau)\rangle_R = m_t\xi_i^\mu$
and variance
$\langle (R_i^\mu(\tau))^2\rangle_R - \langle R_i^\mu(\tau)\rangle_R^2 = a_t - m_t^2$.
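As an illustration of these constraints, the following minimal sketch draws three-state noisy versions of a binary pattern with prescribed overlap and activity. The particular distribution (the noisy unit equals the pattern with probability (a_t + m_t)/2, its negative with probability (a_t - m_t)/2, and zero otherwise) is an assumption made here for the example; the text only fixes the first two moments. The function names are hypothetical.

```python
import random

def sample_noisy_pattern(xi, m_t, a_t, rng):
    """Draw one three-state noisy version R of a binary pattern xi (+1/-1).

    Assumed distribution (not taken from the paper): R = xi with probability
    (a_t + m_t)/2, R = -xi with probability (a_t - m_t)/2, R = 0 otherwise.
    This gives <R> = m_t * xi and <R^2> = a_t.
    """
    if not (abs(m_t) <= a_t <= 1.0):
        raise ValueError("need |m_t| <= a_t <= 1")
    noisy = []
    for x in xi:
        u = rng.random()
        if u < (a_t + m_t) / 2:
            noisy.append(x)       # aligned with the pattern
        elif u < a_t:
            noisy.append(-x)      # anti-aligned with the pattern
        else:
            noisy.append(0)       # inactive unit
    return noisy

def overlap_and_activity(xi, r):
    """Empirical overlap (1/N) sum xi_i R_i and activity (1/N) sum R_i^2."""
    n = len(xi)
    m = sum(x * y for x, y in zip(xi, r)) / n
    a = sum(y * y for y in r) / n
    return m, a
```

For a long pattern the empirical overlap and activity returned by `overlap_and_activity` should be close to the requested `m_t` and `a_t`, up to sampling fluctuations.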

The normalized local field at node i, due to the activity at the other
nodes, is given by

$$h_i(\tau) = \frac{1}{\sqrt{C}}\sum_{j=i_1}^{i_C} J_{ij}\,S_j(\tau) , \qquad (4)$$

where $J_{ij}$ is the synaptic connection between nodes i and j, independent
of the state of the dynamical variable $S_j$, while $i_1, \dots, i_C$ denote
the nodes feeding node i. The connections follow the spherical constraint
$\sum_j J_{ij}^2 = C$, and we consider the extremely diluted network in the
limit of large connectivity in which $1 \ll C \ll \ln N$. The one time-step
dynamics is exact in this limit.

We deal in this paper with the asymptotic, equilibrium configuration
$\{J_{ij}\}$ for the synaptic matrix elements of the learning process that
follows from a Langevin dynamics with a noise term. This involves an annealing
temperature which ensures that the network does not get trapped in
local minima of the free energy. The distribution of equilibrium states of
the $J_{ij}$ can then be described by a canonical ensemble with temperature
$T_a$. Thus, there are two time scales in this approach: a short time scale
for the dynamical evolution of the synaptic matrix $\{J_{ij}\}$ and a long
time scale for the dynamical evolution of the training and of the retrieval
parameters.

The dynamical variables are updated according to the rule

$$S_i(\tau+1) = g(h_i(\tau)) , \qquad (5)$$

where $g(h_i(\tau))$ is the non-decreasing step function

$$g(x) = \sum_{k=1}^{Q} \sigma_k \left[\theta\big(b(\sigma_{k+1}+\sigma_k) - x\big) - \theta\big(b(\sigma_k+\sigma_{k-1}) - x\big)\right] \qquad (6)$$

shown in Figure 1 for Q = 3 and Q = 4, in which $\theta(x)$ is the unit step
function, $\sigma_0 = -\infty$, $\sigma_{Q+1} = +\infty$, $b > 0$ is the
threshold parameter and $\sigma_k$ are the uniformly spaced Ising states of
Eq. (1). Accordingly, there is a zero-activity state whenever Q is odd and
none if Q is even.
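The step function can be sketched in a few lines, assuming the uniformly spaced states $\sigma_k = -1 + 2(k-1)/(Q-1)$ and thresholds at $b(\sigma_{k+1}+\sigma_k)$ between neighbouring states. The function names are illustrative only, and the value returned exactly at a threshold is a convention of this sketch.

```python
def ising_states(Q):
    """Uniformly spaced states sigma_k = -1 + 2(k-1)/(Q-1), k = 1..Q."""
    return [-1.0 + 2.0 * k / (Q - 1) for k in range(Q)]

def g(x, Q, b):
    """Q-state step function: return the state whose threshold window contains x."""
    sigma = ising_states(Q)
    for k in range(Q - 1):
        # Upper threshold of the k-th state: b * (sigma_{k+1} + sigma_k).
        if x < b * (sigma[k + 1] + sigma[k]):
            return sigma[k]
    return sigma[-1]  # x lies above the highest threshold
```

For Q = 3 the thresholds sit at $\pm b$, so fields with $|x| < b$ give the zero state. For Q = 4 they sit at $0$ and $\pm 4b/3$, so there is no zero state and the outer states $\pm 1$ require $|x| > 4b/3$.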

For the adapted optimization a temperature T is introduced as a noise
parameter, not to be confused with the annealing temperature $T_a$, to
characterize the noisy retrieval environment. We assume a Gaussian thermal
noise term added to the local field to write the one-step output as

$$S_i(\tau+1) = g(h_i(\tau) + Tz) , \qquad (7)$$

where z has mean zero and unit variance. The optimization, in the extremely
diluted limit, consists in penalizing deviations from the minimal output error
in one time step on any node, which is independent of the optimization on
all the other nodes. Thus, it is sufficient to consider the cost function for
a single node. We choose this to be

$$E_i(\tau+1) = \sum_\mu \left\langle\left\langle 1 - 2\xi_i^\mu S_i(\tau+1) + S_i^2(\tau+1) \right\rangle_z\right\rangle_R , \qquad (8)$$

where each term of the sum, $d_i^\mu(\tau+1)$, is the average squared Hamming
distance to a stored pattern and $\langle\cdots\rangle_z$ denotes the average
over the Gaussian thermal noise. The training noise enters only through the
local fields. In the case of binary patterns, the local field is a Gaussian
random variable with mean $\langle h_i(\tau)\rangle_R = m_t\Lambda_i^\mu$ and
variance $\langle h_i^2(\tau)\rangle_R - \langle h_i(\tau)\rangle_R^2 = a_t - m_t^2$,
in which

$$\Lambda_i^\mu = \frac{1}{\sqrt{C}}\sum_{j=i_1}^{i_C} J_{ij}\,\xi_j^\mu \qquad (9)$$

is the local field on node i due to the pattern $\mu$.

The optimization of the Hamming distance between the one-step output
of the network in the noisy training environment and a given pattern in a
network of binary neurons is equivalent to finding the optimal output overlap
after one time step. In the case of a network of multi-state neurons, the
Hamming distance also depends on the activity through the local field, and
our first goal is to find the optimal output Hamming distance $d(m_t, a_t)$,
after one time step, for a given training overlap and a given training
activity.

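For that purpose, and for later use, we need the averages

$$S_{m_t,a_t}(\Lambda) = \langle\langle S_i(\tau+1)\rangle_z\rangle_R = \frac{1}{2}\sum_{k=1}^{Q} \sigma_k\,\mathrm{Erf}(\mu_k, l_k; \Lambda) \qquad (10)$$

and

$$S^2_{m_t,a_t}(\Lambda) = \langle\langle S_i^2(\tau+1)\rangle_z\rangle_R = \frac{1}{2}\sum_{k=1}^{Q} \sigma_k^2\,\mathrm{Erf}(\mu_k, l_k; \Lambda) , \qquad (11)$$

which follow from Eq. (7), where

$$\mathrm{Erf}(\mu_k, l_k; \Lambda) = \mathrm{erf}\left(\frac{\mu_k - m_t\Lambda}{\sqrt{2(a_t - m_t^2 + T^2)}}\right) - \mathrm{erf}\left(\frac{l_k - m_t\Lambda}{\sqrt{2(a_t - m_t^2 + T^2)}}\right) , \qquad (12)$$

with

$$\frac{\mu_k}{2b} = \sigma_k + \frac{1}{Q-1} , \qquad \mu_Q = \infty \qquad (13)$$

and

$$\frac{l_k}{2b} = \sigma_k - \frac{1}{Q-1} , \qquad l_1 = -\infty . \qquad (14)$$

The quenched optimization approach requires the introduction of
the partition function

$$Z(\beta) = \int \prod_j dJ_{ij}\ \delta\Big(\sum_j J_{ij}^2 - C\Big) \exp\Big[-\beta\sum_\mu d(\Lambda^\mu)\Big] \qquad (15)$$

to obtain first an annealed average over the space of synaptic connections
$J_{ij}$, in which $\beta = T_a^{-1}$ is the inverse annealing temperature, and
$d(\Lambda^\mu)$ is the squared Hamming distance, for a given configuration
$\{\xi_i^\mu\}$ of encoded patterns, averaged over thermal and training noises.
Its dependence on the noise parameters $m_t$, $a_t$ and T is left implicit.
The quenched average free energy is then obtained making use of the replica
method to write

$$\langle \ln Z\rangle_\xi = \lim_{n\to 0}\frac{1}{n}\left(\langle Z^n\rangle_\xi - 1\right) , \qquad (16)$$

where $\langle\cdots\rangle_\xi$ denotes the average over the set of stored
patterns $\{\xi_i^\mu\}$. Using the standard technique in the space of
synaptic interactions, with the assumption of replica symmetry, we obtain the
optimal one-step output Hamming distance for training

$$d(m_t,a_t) = -\lim_{\beta\to\infty}\frac{1}{\alpha\beta C}\langle \ln Z\rangle_\xi = \mathop{\mathrm{extr}}_{x}\left[\int Dy\,\min_{\Lambda} F(\Lambda,x,y) - \frac{1}{2\alpha x}\right] \qquad (17)$$

as a function of the overlap $m_t$ and activity $a_t$ of the noisy input
patterns, in which $Dy = e^{-y^2/2}\,dy/\sqrt{2\pi}$ is a Gaussian measure and

$$F(\Lambda,x,y) = d(\Lambda) + \frac{(\Lambda - y)^2}{2x} . \qquad (18)$$

Here, $d(\Lambda)$ is the squared Hamming distance averaged over the thermal
and training noises, while $x = \beta(1 - q)$ and

$$q = \frac{1}{C}\sum_j J_{ij}^{\rho} J_{ij}^{\sigma} \qquad (19)$$

for all $\rho \neq \sigma$ is the spin-glass order parameter for the problem.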

The optimization in the training process amounts to taking the limits
$\beta \to \infty$ and $q \to 1$ keeping x finite. A single solution in the
space of interactions is thus obtained out of the full multiplicity of
solutions present when $q < 1$. The minimization with respect to $\Lambda$
yields

$$y(\Lambda) = \Lambda + x\,d'(\Lambda) , \qquad (20)$$

where $\Lambda = \Lambda(y)$ is the inverse function of $y(\Lambda)$. On the
other hand, the extremum in x gives the saddle-point equation

$$\alpha^{-1} = \int Dy\,[\Lambda(y) - y]^2 , \qquad (21)$$

which determines the storage capacity $\alpha$ for a given training
environment.

In cases where $\Lambda(y)$ is a multivalued function of y, which is the case
for Q > 2, there may be one or more transitions, each with a fixed $y_0$
between an upper and a lower value $\Lambda_>$ and $\Lambda_<$, respectively,
ruled by a Maxwell construction

$$\int_{\Lambda_<}^{\Lambda_>} d\Lambda\ y(\Lambda) = y_0(\Lambda_> - \Lambda_<) , \qquad (22)$$

where $y_0 = y(\Lambda_<) = y(\Lambda_>)$. It turns out that the function
$F(\Lambda,x,y)$ is the same on both sides of the "first-order" transition.

The optimal output Hamming distance for training becomes then

$$d(m_t,a_t) = \int Dy\ d(\Lambda(y)) . \qquad (23)$$

It is convenient to introduce the distribution of the local fields due to the
encoded patterns, defined as [3-5]

$$\rho(\Lambda) = \left\langle \delta\big(\Lambda - \Lambda_i^\mu\big) \right\rangle_J , \qquad (24)$$

where the ensemble average $\langle\cdots\rangle_J$ is performed with the
partition function of Eq. (15). It turns out that this distribution becomes

$$\rho(\Lambda) = \int Dy\ \delta(\Lambda - \Lambda(y)) , \qquad (25)$$

and the transition between the lower and upper bounds, $\Lambda_<$ and
$\Lambda_>$ respectively, implies a gap in the distribution of local fields
$\rho(\Lambda)$ whenever $\Lambda(y)$ is a multivalued function of y.

The optimal one-step output Hamming distance for training with noise
may now be written as

$$d(\tau+1) = 1 - 2 f_{m_t,a_t}(m_t,a_t) + g_{m_t,a_t}(m_t,a_t) , \qquad (26)$$

where

$$f_{m_t,a_t}(m_t,a_t) = \int d\Lambda\ \rho_{m_t,a_t}(\Lambda)\,S_{m_t,a_t}(\Lambda) \qquad (27)$$

is the optimized overlap between the encoded patterns and their noisy
versions and

$$g_{m_t,a_t}(m_t,a_t) = \int d\Lambda\ \rho_{m_t,a_t}(\Lambda)\,S^2_{m_t,a_t}(\Lambda) \qquad (28)$$

is their optimized activity. The distribution of the local fields,
$\rho_{m_t,a_t}(\Lambda)$, is a characteristic property of the training set
and, as such, it depends on $m_t$ and $a_t$.
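The single-field averages that enter the integrands of Eqs. (27) and (28) are differences of error functions over the threshold windows of each state. The sketch below evaluates them for a given local field, assuming window edges at $2b(\sigma_k \pm 1/(Q-1))$ with the outermost edges at $\pm\infty$, and a Gaussian field of mean $m\Lambda$ and variance $a - m^2 + T^2$. The function name and argument order are illustrative, and the distribution $\rho(\Lambda)$ itself is not computed here.

```python
import math

def erf_window_averages(lam, m, a, T, Q, b):
    """Return (S, S2): the noise-averaged output and squared output at field lam."""
    var = a - m * m + T * T
    if var <= 0.0:
        raise ValueError("a - m^2 + T^2 must be positive")
    s = math.sqrt(2.0 * var)
    S = 0.0
    S2 = 0.0
    for k in range(1, Q + 1):
        sigma = -1.0 + 2.0 * (k - 1) / (Q - 1)
        # Window edges for state k; the outermost windows are unbounded.
        upper = math.inf if k == Q else 2.0 * b * (sigma + 1.0 / (Q - 1))
        lower = -math.inf if k == 1 else 2.0 * b * (sigma - 1.0 / (Q - 1))
        # Twice the probability that the noisy field falls in the window.
        w = math.erf((upper - m * lam) / s) - math.erf((lower - m * lam) / s)
        S += 0.5 * sigma * w
        S2 += 0.5 * sigma * sigma * w
    return S, S2
```

Two limiting cases give a quick consistency check. For Q = 2 the threshold drops out, the average output reduces to a single error function of $m\Lambda$, and the squared output is 1. For Q = 3 and $m = 0$ the average output vanishes and the squared output no longer depends on $\Lambda$.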

The formal results presented so far assume that replica symmetry holds
in the space of interactions. The condition for local stability of the
replica symmetric saddle-point can be written as [20, 24]

$$\alpha^{-1} > \int Dy\,[\Lambda'(y) - 1]^2 , \qquad (29)$$

in which $\Lambda' = d\Lambda/dy$, and this is to be solved together with
Eq. (21). When the distribution of the local fields has a gap, $\Lambda'$
diverges and the condition cannot be satisfied. Then, the network becomes
unstable to replica-symmetry-breaking fluctuations. The limiting load for
which Eq. (29) is still satisfied yields the de Almeida-Thouless (AT) line,
$\alpha_{AT}(T)$. The dependence on the retrieval noise T comes from
$\Lambda$. Note that the AT line must lie within the one-band region or, at
most, on the band-merging surface where the gap in $\rho(\Lambda)$ disappears.
This completes the formal description of the training process in itself. In
order to become optimally adapted, we consider now the retrieval process.

The calculation of the one-step output Hamming distance between any
input state $\{S_i(\tau)\}$ and a given encoded pattern in a noisy retrieval
environment, with temperature T, is now obtained as follows. First, the
training parameters $m_t$ and $a_t$ in Eqs. (10)-(12) are replaced by the
overlap $m(\tau)$ and the dynamical activity $a(\tau)$ of the noisy retrieval
state $\{S_i(\tau)\}$, expressed respectively as Eqs. (2) and (3) with
$\{S_i(\tau)\}$ in place of the noisy pattern $\{R_i^\mu(\tau)\}$. The
one-step output Hamming distance in the retrieval environment is now given by
an expression similar to Eq. (26), depending on the pair $(m_t, a_t)$ through
the distribution of local fields and on the pair $(m, a)$ through the present
state of the network as given, literally, by Eqs. (27) and (28).

Now, the training overlap $m_t$ and the training activity $a_t$ which give the
optimal performance for retrieval at a fixed temperature T, storage level
$\alpha$ and threshold parameter b are given by the adaptation principle. The
optimal adaptation consists in a search in the space of interactions
$\{J_{ij}\}$ simultaneously with a search in the space of state configurations
$\{S_i(\tau)\}$. The best adapted performance of the network is attained by
adjusting the training noise and activity to the same level as the retrieval
noise and activity. For the parallel dynamics in the extremely diluted network
we are dealing with, the stable fixed point of the set of equations

$$f_{m,a}(m,a) = m \qquad (30)$$

and

$$g_{m,a}(m,a) = a \qquad (31)$$

gives at the same time the optimal training condition and the optimized
performance. The stable fixed point for each value of the synaptic noise
parameter T, the storage ratio $\alpha$ and the threshold parameter b is a
retriever, for which the network has a unique interaction configuration. In
other words, in contrast to the usual phase diagrams for retrieval, every
point of the phase diagrams that will be discussed next represents a
different network.

3. Results and discussion 

We present next the results for the optimally adapted retrievers. The rich
structure of locally stable states and the corresponding phase diagrams for
self-adaptation that arise as the threshold parameter b is increased are
discussed separately for Q = 3 and Q = 4.

3.1. Three-state network

To illustrate the role of the threshold parameter b, we discuss first the
fixed-point solutions for $m$ and $a$ and the corresponding phase diagram for
$\alpha$ vs. b, in the absence of retrieval noise, shown in Figure 2. For
fixed b within the range $0 < b < 0.57$ and $0 < \alpha < \alpha_1(b)$, there
is a perfect retriever with $m = 1 = a$ which is the only stable fixed point,
and a solution with $m = 0$ and either $a \neq 0$ or $a = 0$, which is an
unstable fixed point. This suggests that one can conceive a network capable of
perfect retrieval operating with a limited threshold, as long as the training
is with infinitesimal noise $m_t = 1^-$ and almost full activity $a_t = 1^-$.
The corresponding retriever is that of the MSN.

The line $\alpha_1(b)$ deserves further attention. It is the upper bound of
the region where the perfect retriever is the only attractor in the retriever
dynamics with a wide basin of attraction for self-adaptation. Beyond that
line, the basin of attraction of this retriever is greatly reduced in the
three-state network, as will be seen next. Thus, for increasing b, in the
small b regime, there is an enhancement of the associativity of the network,
as long as $\alpha_1$ is an increasing function of b.

A new pair of stable and unstable fixed points appears discontinuously at
$\alpha_1(b)$. The stable fixed point represents a new retriever of weaker
attractor overlap and reduced activity. Note, however, that for low to
moderate b (illustrated in the inset by b = 0.5), there is a considerably
enhanced retrieval overlap when compared with the overlap for the optimally
adapted network of binary units. The second retriever has a rather wide basin
of attraction for this larger retriever overlap. This higher performance can
be attained through training with low-noise patterns with moderately high
activity. For the threshold $b \simeq 0.3$ that maximizes $\alpha_1(b)$, the
improvement in storage capacity with the same retrieval overlap as that of the
network of binary neurons is about 20%. However, as one would expect, the
performance deteriorates with a further increase in the threshold b.

The second stable fixed point means that there exists a second training
condition, with higher noise, which results in a network with lower, but still
optimal performance when compared with other three-state networks in its
vicinity of the space of interactions, for this training condition. The
unstable fixed points are repellers of the self-adaptation dynamics.

The overlap of this second retriever vanishes continuously as $\alpha$
increases approaching $\alpha_2(b)$. For $\alpha_2(b) < \alpha < \alpha_c(b)$,
the perfect retriever and a non-retriever with $m = 0$, and either a finite or
no activity, are the only stable fixed-point solutions. The non-retriever
state with $a \neq 0$ appears as a self-sustained activity phase, which was
first discussed for a diluted network with a Hebbian learning rule. When the
activity is zero the network stops operating.

The presence of a non-retriever with finite activity follows from the
fixed-point solution for $(m, a)$ when $m = 0$ is a stable fixed point. The
expression for $S^2_{m,a}(\Lambda)$ becomes then independent of the local
field $\Lambda$ and, hence, of x and $\alpha$. The fixed-point values for $a$
are then given by the solutions of the equation
$a = 1 - \mathrm{erf}\big(b/\sqrt{2(a + T^2)}\big)$. The solution $a = 0$ is
stable for all b, when T = 0. There is a second stable fixed point that
decreases monotonically from $a = 1$, at $b = 0$, and disappears
discontinuously at $b \simeq 0.57$ when the value $a \simeq 0.23$ is reached.
This is the origin of the "tricritical" point in the phase diagram for
$\alpha$ vs. b, where the line of continuous transitions for the overlap
becomes discontinuous. We come back to this point below. It is important to
remark that the term "transition" here only means that the network changes
from one retriever state to another one. We recall that it is not meant as a
usual thermodynamic phase transition, since each point of the phase diagram
corresponds to a different network.
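The activity equation for the non-retriever can be explored numerically. The sketch below brackets the nonzero solutions of $a = 1 - \mathrm{erf}(b/\sqrt{2(a+T^2)})$ on a grid and refines them by bisection. It treats the equation as a one-dimensional map in $a$ with $m = 0$ held fixed, which is a simplifying assumption: stability is judged only along the activity direction. The function names and the grid size are illustrative choices, and $b > 0$ is assumed.

```python
import math

def activity_map(a, b, T):
    """Right-hand side of a = 1 - erf(b / sqrt(2 (a + T^2))), for b > 0."""
    s = a + T * T
    if s <= 0.0:
        return 0.0  # erf(+inf) = 1, so the map vanishes
    return 1.0 - math.erf(b / math.sqrt(2.0 * s))

def nonzero_activity_fixed_points(b, T=0.0, grid=2000):
    """Return [(a, stable), ...] for fixed points with a > 0.

    A root is flagged stable when G(a) = map(a) - a crosses zero from above,
    which for this increasing map is equivalent to a slope below one.
    """
    def G(a):
        return activity_map(a, b, T) - a

    pts = [1e-6 + (1.0 - 1e-6) * i / grid for i in range(grid + 1)]
    roots = []
    for lo, hi in zip(pts, pts[1:]):
        if G(lo) * G(hi) < 0.0:
            stable = G(lo) > 0.0
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if G(lo) * G(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append((0.5 * (lo + hi), stable))
    return roots
```

At T = 0 and b = 0.5 this finds an unstable and a stable nonzero solution, while at b = 0.6 no nonzero solution remains, which is consistent with the stable branch being lost near $b \simeq 0.57$ as described above.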

Finally, when $\alpha$ reaches the critical storage capacity $\alpha_c(b)$,
given by

$$\alpha_c^{-1}(b) = \int_{-\infty}^{b} Dy\,(b - y)^2 , \qquad (32)$$

the perfect retriever is destabilized.
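The Gaussian integral for the critical capacity can be done in closed form. Assuming the upper limit of integration is the threshold b, one finds $\alpha_c^{-1}(b) = (1 + b^2)\,\Phi(b) + b\,\varphi(b)$, where $\Phi$ and $\varphi$ are the standard normal distribution function and density. A minimal sketch, with an illustrative function name:

```python
import math

def alpha_c_three_state(b):
    """Critical capacity from 1/alpha_c = int_{-inf}^{b} Dy (b - y)^2."""
    Phi = 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))         # Gaussian CDF at b
    phi = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)   # Gaussian density at b
    return 1.0 / ((1.0 + b * b) * Phi + b * phi)
```

At b = 0 this reduces to $\alpha_c = 2$, the value for the optimal network of binary units, and the capacity of the perfect retriever decreases monotonically as the threshold grows.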

Consider next the case where $0.57 < b < 0.82$. For $0 < \alpha < \alpha_1(b)$,
there is again a perfect retriever which is a stable fixed-point solution. In
addition, a pair of stable and unstable fixed points appears. The stable
fixed point is a non-retriever with $m = 0$ and either $a \neq 0$ or $a = 0$.
A new pair of stable and unstable fixed points appears discontinuously at
$\alpha_1(b)$. The stable fixed point is, again, a retriever of weaker
attractor overlap and reduced activity. However, as $\alpha$ approaches
$\alpha_2(b)$, this second retriever vanishes discontinuously and, thus, there
is a changeover from the line of continuous transitions $\alpha_2(b)$ when b
increases and reaches a tricritical point at $b \simeq 0.57$. When $\alpha$
increases beyond $\alpha_2(b)$, the perfect retriever and the non-retriever
are, again, stable fixed-point solutions, and the perfect retriever, which has
a narrow basin of attraction, is destabilized when the critical $\alpha_c(b)$
is reached. When b is increased, the retriever of weaker attractor overlap
disappears at $b \simeq 0.82$, and beyond this point the perfect retriever is
the only stable fixed point with finite overlap for $0 < \alpha < \alpha_c(b)$.

Now we discuss the stability of the replica symmetric solution. First, the
strong retriever state is always stable to replica-symmetry-breaking
fluctuations below $\alpha_c$. Thus, at most the weak retriever can become
unstable. In view of this, we mapped out the region of the phase diagram where
the stability condition, Eq. (29), is not satisfied for the weak retriever
state, and this is shown as the shaded area in Figure 2, the dash-dotted line
being the AT line. Furthermore, we found that this line corresponds to the
appearance of a gap in the distribution of local fields.

The phase diagram also yields the optimal basin boundary of the
self-adaptation dynamics for a given $\alpha$. Thus, as $\alpha$ increases for
$b < 0.82$ the strong retriever is a "wide" retriever for
$\alpha < \alpha_1(b)$, since it is the only attractor in the self-adaptation
dynamics. For $\alpha > \alpha_1(b)$, the strong retriever becomes a "narrow"
retriever which coexists with the weak retriever. Finally, for $b > 0.82$, the
strong retriever is a narrow retriever that coexists with the non-retriever
state for all $\alpha < \alpha_c(b)$.

We consider next the results in the presence of retrieval noise T. In the
case of a small to moderate threshold where the strong and weak retrievers
coexist, say, for b = 0.5, the phase diagram for T vs. $\alpha$ is not very
different from the phase diagram for the network of binary units. The strong
and the weak retriever coexist now over a wider range of $\alpha$ but the
strong retriever disappears, as one would expect, at a lower T. More
interesting are the results for the phase diagram and the underlying
fixed-point solutions for the overlap and the activity when b = 1, shown in
Figure 3. This threshold is typical of an optimally adapted network that has a
perfect retriever as the only stable fixed point with non-zero overlap at
T = 0. For fixed and low T < 0.4, there is a strong retriever with rapidly
decreasing $m$ and $a$ when $\alpha$ comes close to the line $\alpha_c(T)$,
where both parameters vanish discontinuously. There is a second stable fixed
point with $m = 0$ and $a \simeq 0$, for all $\alpha > 0$, and this
non-retriever is the only stable solution for $\alpha > \alpha_c(T)$. There is
also an unstable fixed point for $m$ and $a$ throughout the range
$0 < \alpha < \alpha_c(T)$ that separates the basins of attraction for
self-adaptation of the two stable fixed points, and indicates that the strong
retriever is a narrow retriever in this interval.

An increase in retrieval noise can be of use for the enhancement of the
performance of the single, strong retriever, with a moderately large threshold,
as in the present case of b = 1. Indeed, for 0.4 < T < 0.8, the non-retriever
becomes an unstable fixed point for $\alpha$ below the line $\alpha_1(T)$,
leaving the strong retriever as a wide retriever. The overlap and the activity
change discontinuously as $\alpha$ goes through $\alpha_1(T)$. For T > 0.8,
the overlap of the wide retriever vanishes continuously as $\alpha$ approaches
$\alpha_c(T)$. The results shown here confirm the general expectation that one
cannot attain the best retriever overlap (as we have here for the narrow
retriever) together with the best associativity, as for the wide retriever, in
the same network except at the phase boundary.

The AT line, coinciding with the locus where the gap closes, is also
shown in Figure 3, and the region to the right of the line up to the
$\alpha_c$ line is stable to replica-symmetry-breaking fluctuations. Thus, it
seems that the part of the discontinuous transition line $\alpha_c(T)$ that is
close to the tricritical point, where the changeover to the line of continuous
transitions takes place, is marginally stable. We also argue that for low T
the line $\alpha_c(T)$ may be almost correct, since $\alpha_c(0)$ is the
critical capacity of the MSN, which corresponds to a stable point. Note that
the line $\alpha_c(T)$ of discontinuous transitions has an upper part of
infinite slope which should also be correct, since one would not expect a
reentrant behavior for $\alpha_c(T)$. Finally, for comparison, we also show
the phase boundaries for the MSN and conclude that the optimally adapted
network with three-state neurons has an improved performance in the presence
of retrieval noise.

3.2. Four-state network

To see now the effects of the threshold in the optimally adapted four-state
network, we present first the results for the fixed-point solutions for the
overlap and the activity in Figure 4. Depending on the value of b there may
be a domain in the values of $\alpha$ in which there are up to three stable
fixed-point solutions with non-zero $m$, one for a perfect retriever and the
other ones for weaker retrievers. The perfect retriever exists up to a
critical $\alpha_c(b)$, given by




(33)

It turns out that there is a load $\alpha_1(b)$ for all b, where a weak
retriever, which may or may not be the only one, appears discontinuously as
$\alpha$ attains that value. For $b < 0.65$, it is the only weak retriever, as
can be seen in the phase diagram for $\alpha$ vs. b shown in Figure 5. Note
that, also for the four-state network, $\alpha_1$ increases with b in the
small b regime with a considerable enhancement of the strong retriever as a
wide retriever. The perfect and the weak retriever coexist with increasing
$\alpha$ until either the weak retriever disappears continuously at
$\alpha_2(b)$, which is the case for $b < 0.44$, or the strong retriever ends
at $\alpha_c(b)$ for $0.44 < b < 0.65$. In the latter case, the weak retriever
of non-zero overlap remains as the only attractor of self-adaptation up to
$\alpha_2(b) > \alpha_c(b)$.

On the other hand, for b well above 0.65, a second weak retriever (WR2)
appears discontinuously as α attains the line α_4(b), while the first weak
retriever (WR1) extends up to a considerably higher load α_3(b), where the state of
the network changes discontinuously to the non-retriever state. The overlap
of the WR2 vanishes continuously as α approaches α_2(b). The two weak
retrievers coexist for α_4(b) < α < α_2(b). Note that both the line where the
first weak retriever disappears and the domain of α where the second weak
retriever exists may lie well above the critical capacity α_c(b) for the existence
of the perfect retriever.

The situation can become more involved for intermediate values of b, 
shown by the inset in Figure 5. Around the endpoint C of the wedge of 
discontinuous transition lines, the second weak retriever can be reached
continuously from the first one. 

It is interesting to note that, for large b, the WR1 state has an asymptotic
overlap and activity m ~ 1/3 and a ~ 1/9, respectively. These correspond to
the storage of binary patterns in a network with only the microscopic states
S_i = ±1/3 being activated. These are, practically, the only states favoured in
the high-b regime, since the states S_i = ±1 can only become active by means
of high local fields, which are extremely unlikely in the absence of retrieval
noise. Indeed, we found that the line α_3(b) goes to the critical value α_c = 2
for the optimal network of binary units with increasingly large b. Thus, as
expected, the behavior of the network in the large-b limit should become
that of the MSN with reduced overlap and activity.
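
The asymptotic values m ~ 1/3 and a ~ 1/9 can be checked with a short numerical sketch. It is only an illustration under stated assumptions: the overlap and activity are taken in their usual forms m = (1/N) Σ ξ_i S_i and a = (1/N) Σ S_i², the four states are taken as equidistant in [-1, 1], and the thresholds of the step function are placed at b(s_k + s_{k+1}), a common convention for Q-state neurons that may differ from the one used for Figure 1. The function name make_g and the unit field magnitude are hypothetical.

```python
import bisect
import random


def make_g(Q, b):
    """Return a non-decreasing step function with Q equidistant states in [-1, 1].

    Assumption: thresholds sit at b * (s_k + s_{k+1}); this is one common
    convention and not necessarily the one used in the text.
    """
    states = [-1.0 + 2.0 * k / (Q - 1) for k in range(Q)]
    thresholds = [b * (states[k] + states[k + 1]) for k in range(Q - 1)]

    def g(h):
        # Number of thresholds at or below h selects the output state.
        return states[bisect.bisect_right(thresholds, h)]

    return g


random.seed(0)
N = 1000
xi = [random.choice((-1.0, 1.0)) for _ in range(N)]  # binary pattern

# Large threshold: the outer thresholds lie at +-4b/3, far above |h| = 1,
# so only the inner states +-1/3 can be activated.
g4 = make_g(Q=4, b=5.0)
S = [g4(1.0 * x) for x in xi]  # hypothetical local fields aligned with the pattern

m = sum(x * s for x, s in zip(xi, S)) / N  # overlap
a = sum(s * s for s in S) / N              # activity
print(m, a)
```

With fields of moderate size and large b, every unit sits in a state ±1/3 aligned with the pattern, so the overlap and activity reduce to 1/3 and 1/9, which is the binary-network behavior with reduced overlap and activity described above.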

The phase diagram in Figure 5 also provides the optimal basin boundary
of attraction, for a given α and b. For b = 1, say, the strong retriever is a
wide retriever for α < α_1(b), and a narrow retriever when α_1(b) < α < α_c(b).
On the other hand, in the interval α_c(b) < α < α_4(b), the weak attractor
with higher overlap is a wide retriever, since it is the only attractor for
the self-adapting dynamics in this interval. In contrast, in the interval
α_4(b) < α < α_3(b) that weak retriever is a narrow retriever, which coexists
with WR2 if α < α_2(b) and with the non-retriever state otherwise.

To discuss the validity of the replica symmetric results note that, when- 
ever two weak retrievers coexist in the phase diagram, each one has to be 
analyzed separately since they refer to different levels of training noise, such 
that one may correspond to a gapless local field distribution and the other 
may not. The AT line is the dash-dotted line shown in Figure 5, which starts
on the boundary α_1(b), where the single weak retriever appears for small b,
and merges with α_4(b) around b = 0.8. That retriever is stable to
replica-symmetry-breaking fluctuations above the AT line. The WR2 is unstable
around C and is stable in the strip α_4(b) < α < α_2(b), whereas the WR1 is
unstable everywhere below and at the boundary α_3(b). The left part of the
boundary α_1(b) is marginally stable, as well as the boundary α_c(b) for the
perfect retriever. 

4. Summary and concluding remarks 

The principle of adaptation, formulated earlier for a network of binary neu- 
rons, has been extended in this work to study the training and performance of 
optimally adapted attractor neural networks of multi-state neurons trained 
with noisy inputs in the presence of a noisy retrieval environment. Explicit 
results were obtained for the optimal attractor overlap and the optimal
dynamical activity as functions of the retrieval noise T, the load α and the
threshold b, for a network with dilute connectivity. The maximum storage
capacity was also obtained as a function of b and T, and explicit retriever phase
diagrams of performance and associativity of the retrievers are exhibited for
a network of three- or four-state neurons. These are phase diagrams for self-
adaptation, in distinction to phase diagrams for attraction, as pointed out in 
ref. 0. We remind the reader that, as pointed out by Wong and Sherring- 
ton, coexisting retrievers are solutions for different networks, which should 
correspond to distinct synaptic interactions. 

An important issue of this work concerns the improvement in the associativity
of multi-state networks when the width b of the intermediate states
increases, in the small-b regime. The enhanced performance of the second
retrievers has also been emphasized. This is important because they are op- 
timal retriever solutions on their own, rather than weaker retrieval solutions 
for the optimal network configuration, if such solutions exist 0. We have 
shown that an improvement of the performance of the second retriever in 
the optimally adapted network with multi-state units can be attained with 
relatively small training noise and large-activity input patterns. In practical 
terms, this may be a more accessible situation than training with an infinites- 
imal amount of noise and almost full activity. Furthermore, we have shown 
that the storage capacity of the second retriever is a non-monotonic function 
of the threshold b, with an increasing capacity for small b. With a moderately
large threshold, as in the case of b = 1 for the three-state network, an
increase in retrieval noise T may help to enlarge the basin of attraction of
the single, strong retriever. This can be understood by noting that the increase
in the noise should help to overcome the large gap in the local field in firing
the units when the network has been trained with a moderate training noise.
These are important results in the search for improvement of the behaviour 
of attractor neural networks. 

The work presented here is restricted, for simplicity, to binary encoded 
patterns. On the basis of results we obtained for three or four-state patterns, 
we argue that this should not be a serious restriction. What is important is 
that the states of the noisy training set {-Rf (t)} have the same degrees of 
freedom as the arbitrary input set {S'j(r)} for retrieval. This requires the 
introduction of a training activity a_t in the noisy inputs, in order to optimize
both the training and the adaptation process in the Q-state network. 

We have found, in accordance with earlier works, that networks are spe- 
cialized 0, 0. Indeed, one cannot attain the best storage capacity for all T 
and b in a single network. Even if b is fixed, the storage capacity of the strong
retriever will be that of the MSN only at very low T and it will become that 
of the Hopfield model at high T. 

All the results were obtained with the assumption of replica symmetry 
in the space of synaptic interactions, and the limit of validity of this
assumption has been established by finding the de Almeida-Thouless lines α_AT(b) at
T = 0 and α_AT(T) for a given b. These lines coincide with the band-merging
lines for the distribution of the local field. Due to the presence of optimal
solutions for small-to-moderate training noise, there are gaps in the
distribution of the local fields over sizeable domains of the phase diagram which
are not stable to replica-symmetry-breaking fluctuations. Nevertheless,
interesting phase boundaries and domains of the phase diagrams are stable or,
at worst, marginally stable, confirming the validity of our results. Indeed,
the enhancement of the line α_1(b), where the second retriever appears for
small training noise and large activity, both for Q = 3 and Q = 4, lies on
the replica symmetric side of the AT line. Furthermore, the interesting weak
retriever lies completely on this side. That is also the case for the tricritical
point and the first-order transition line, α_2(b), for the three-state network,
which at worst becomes marginally stable. Furthermore, the phase diagram
for T = T(α) reveals that the line α_2 of continuous transitions is stable to
replica-symmetry-breaking fluctuations, for both Q = 3 and Q = 4 and all b.
In view of these results, it does not seem worthwhile to pursue a calculation 
beyond the replica-symmetry Ansatz. 

A closer look at our results reveals that although the critical capacity α_c,
where the strong retriever terminates, decreases faster with increasing b for
the four-state than for the three-state network, the trend is opposite for the
lower and upper critical storage ratios α_1 and α_2, respectively, for the presence
of a second retriever in the low-b regime. This suggests that the role of the
threshold could become even more important in optimally adapted higher
Q-state networks. The extended principle of adaptation of the present work
assumes that both the training overlap and the training activity become
continuously adapted to the noisy retrieval environment. In particular, the
training activity follows the changes in the dynamical activity characteristic
of the Q states of the units, and this makes the study of the optimally
adapted network for general Q difficult. It may be possible to study a weaker
version of the extended adaptation principle for the graded response network 
in which the training activity remains fixed. This, and other questions, will 
be considered in future work. 

Figure captions



Figure 1: The non-decreasing step function g(x) for Q = 3 (a) and Q = 4 (b).

Figure 2: Phase diagram for the load α as a function of the threshold b for
Q = 3, at T = 0, and the corresponding optimal overlap m (solid lines) and
activity a (dashed lines) for b = 0.5 (right), b = 0.7 (center) and b = 0.9
(left), in the inset. Unstable fixed-point solutions are shown in light lines.
SR and WR are strong and weak retrievers, respectively. The SR is a wide
retriever at the left of the light dotted line and below α_1(b). Solid lines in the
phase diagram indicate discontinuous transitions and a dashed line a
continuous transition. The dash-dotted line is the de Almeida-Thouless line (cf.
the text). The WR is unstable to replica-symmetry-breaking in the shaded
area.

Figure 3: Phase diagram, for T vs. α, for Q = 3 and b = 1. In the inset
are shown the optimal overlap (solid lines) and activity (dashed lines) for
T = 0, T = 0.5 and T = 1; the unstable solutions for m and a are in light
lines. In R1 (R2) the strong retriever is a narrow (wide) retriever. NR is the
non-retriever phase. The dash-dotted line is the de Almeida-Thouless line.

Figure 4: Optimal overlap (solid lines) and activity (dashed lines) for Q = 4,
at T = 0, for b = 0.6 and b = 0.8. The various α indicate the loads for which
the optimal solutions appear or disappear, for b = 0.8, and WR, WR1 and
WR2 are weak retrievers.

Figure 5: Phase diagram for α as a function of b, for Q = 4 at T = 0,
described in the text. The amplified central part is shown separately. The
retrievers and the nature (continuous or discontinuous) of the phase
boundaries are as in previous figures. The SR, WR1 and WR2 coexist in the shaded
area of the inset. The de Almeida-Thouless line is the dash-dotted line.